Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve popup performance #1487

Merged
merged 34 commits into from
Dec 24, 2024
Merged

Improve popup performance #1487

merged 34 commits into from
Dec 24, 2024

Conversation

djahandarie
Copy link
Collaborator

@djahandarie djahandarie commented Oct 13, 2024

Quite a bit of popup CPU usage comes from the following things:

  • Constant relayouts due to (intentional) progressive loading of entries within the popup
  • Media loading

The first is resolved easily by introducing content-visibility: auto, at the downside of the scrollbar being less accurate but I think it's the right tradeoff.

The second is resolved by the rest of the PR, which is insane and probably impossible to understand.

Changes to media loading

Previous to this PR, media requests were sent one-by-one from the popup to the backend, which would generate a data URL that would be sent back to the popup and inserted in an image tag.

I considered another approach, which batches all the image requests for a given entry, prepares canvas elements, and sends OffscreenCanvases to the backend, which then fetches the unique set of images for that entry and draws the appropriate ones onto the appropriate canvases.

This helps quite a bit since we can avoid creating Blobs and Data URIs, and also avoid duplicate requests to IndexedDB. IndexedDB requests are very slow here.

However, Firefox can't do cross-process transfers of OffscreenCanvases, so I settled on an architecture of having a worker within the popup which does drawing onto the OffscreenCanvases, and the backend worker transfers ArrayBuffers to the popup worker for drawing (because Firefox can do cross-process transfers of those).

Changes to communications

Unfortunately, implementing the above was not so straightforward, as we primarily use chrome.runtime.sendMessage which does not allow for transferring objects and serializes everything, making the postMessage transfers impossible.

So, I introduced a couple new communication channels:

  • The popup now does a serviceWorker postMessage to the active SW (on Chrome), or establishes a MessageChannel using a temporary SharedWorker and postMessages over that (on Firefox)
  • (On chrome) The offscreen now registers a MessageChannel with the SW, so the SW can postMessage over that channel to the offscreen. The offscreen watches for SW changes and reregisters so the SW will always have a channel for it.

This builds a full postMessage channel all the way from the popup to the backend (offscreen in case of Chrome), which allows for building further MessageChannels between popup and various backend workers (dictionary worker in the case of this PR). And with postMessage comes the ability to transfer things, which is critical for performance.

Potential improvements

  • Right now the way dictionary-database builds its worker is a bit ugly / overspecific to drawMedia. In the future we should probably consider some more generic architecture, where we have pools of dictionary workers, which can handle whatever dictionary-related work we want, and nothing DB-related is done in the main backend thread

@MarvNC
Copy link
Member

MarvNC commented Oct 13, 2024

I tried it and it seems a little better with 96 dictionaries enabled.

@MarvNC
Copy link
Member

MarvNC commented Oct 25, 2024

Tried the latest changes and it feels like a dramatic improvement. Great work!

@koiyakiya
Copy link

koiyakiya commented Oct 25, 2024

when i try to build this for chrome and import a dictionary collection file, it completely breaks yomitan and spits out a ton of errors, animations are broken etc
image
not sure if I'm just doing something wrong or there's a genuine problem

@4890A
Copy link

4890A commented Oct 26, 2024

Same issue with loading dict collections or importing single dicts on the PR. (No response/animations after selecting a file). Loading recommended dicts still works however. I was able to test the PR by loading my dict collection with the main dev branch first and then copying over the files from this PR. The performance boost is significant for me with 35 imported dictionaries.

@djahandarie
Copy link
Collaborator Author

djahandarie commented Oct 26, 2024

Yeah there's almost certainly a problem, I forget that the importer also runs some parts of this DB code in a worker and I changed some of the DB code to behave differently based on whether it's executing in a worker or not and neglected to check the behavior of the import code. I'll take a look. Thanks for testing!

@MarvNC
Copy link
Member

MarvNC commented Oct 27, 2024

Some images seem to be broken (pixiv png and kanji png here)
msedge_Yomitan_Search_-矜持を正す-_Yomitan_Search_2024-10-26_19-02-11

@djahandarie
Copy link
Collaborator Author

Thanks, is that the entry for 矜持?

@MarvNC
Copy link
Member

MarvNC commented Oct 27, 2024

yep

@MarvNC
Copy link
Member

MarvNC commented Oct 27, 2024

actually all images are broken so maybe it's just on my end.

msedge_Yomitan_Search_-りんご-_Yomitan_Search_2024-10-26_19-15-32

i just loaded this PR on top of my existing yomitan install.

@4890A
Copy link

4890A commented Oct 27, 2024

Seems like images break after a certain point. After restarting my browser pictures work, scan a bit and they dissapear again.

@djahandarie
Copy link
Collaborator Author

Can you guys open the service worker, as well as offscreen inspector tabs from the extension details page, and let me know if there are errors in either of those consoles?

@MarvNC
Copy link
Member

MarvNC commented Oct 27, 2024

no console errors present in either.

@djahandarie
Copy link
Collaborator Author

How about immediately after restarting the extension and doing the first lookup? It must be breaking somewhere 🤔

@4890A
Copy link

4890A commented Oct 27, 2024

Didn't see any console errors while it was broken, and now after restarting I can't reproduce the issue.

@MarvNC
Copy link
Member

MarvNC commented Nov 24, 2024

Just to leave a record, I've been using d3f1317 on Edge for the past few weeks and haven't had any issues. Although I haven't tried dictionary collection import which 4890 reported was broken.

github-merge-queue bot pushed a commit that referenced this pull request Dec 16, 2024
* Safe changes

* Fix some types

* Fix some types

* Remove api-map

* Remove deps

* remove wasm-unsafe-eval

* Revert manifest variants

* Add thing

* Types

* Revert db change
Copy link

codspeed-hq bot commented Dec 17, 2024

CodSpeed Performance Report

Merging #1487 will improve performances by 41.68%

Comparing performance-canvas (58c17c3) with master (120f0af)

Summary

⚡ 1 improvements
✅ 2 untouched benchmarks

Benchmarks breakdown

Benchmark master performance-canvas Change
Translator.prototype.findKanji - (n=3) 3.9 ms 2.8 ms +41.68%

@jamesmaa
Copy link
Collaborator

jamesmaa commented Dec 17, 2024

Dumb questions about this PR because I haven't really looked at it until today

  • Does "media loading" refer to when dictionaries have images? It seems like the performance enhancements improve general performance, or is that an edge case?
  • Can you explain at a high level what the CSS changes are and how they're related to the performance work? I'm not understanding the connection here
  • Do you have a recommended order of files to read in order to better understand this PR?

Copy link
Collaborator

@jamesmaa jamesmaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw you also have some leftover console.log and commented out code that needs cleaning up. Very cool stuff I'm seeing here and excited to try to get this out the door.

I'm not done with my review I'm just doing a hard cutoff for bed without being completely blocked on me

ext/js/dictionary/dictionary-database.js Show resolved Hide resolved
ext/js/dictionary/dictionary-database.js Show resolved Hide resolved
ext/js/dictionary/dictionary-database.js Outdated Show resolved Hide resolved
ext/js/dictionary/dictionary-database.js Outdated Show resolved Hide resolved
);

// when we are not a worker ourselves, create a worker which is basically just a wrapper around this class, which we can use to offload some functions to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "not a worker" equivalent to saying "main thread" or "main process"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, not exactly. On Chrome, we expect this to be running in the offscreen context, and in Firefox we expect it to be in the background page context. In theory it could also run in the SW context in Chrome though we don't do that currently since SW is short-lived and we want to keep the DB open longer. These contexts are neither threads or processes, as threads and processes are implementation details of the browser. (e.g., in Chrome each extension runs in its own process, and I'm not sure if SW and offscreen are in different processes, while on Firefox all extensions run in a single process, but any extension resources which are loaded as a child of a non-extension resource seems to be in the normal webpage process).

So "not a worker" is the most technically accurate way to capture all the possible situations here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to educate myself via ChatGPT. lmk if this explanation of the difference between offscreen, background, and sw context is wrong

https://chatgpt.com/share/67627dc8-294c-8002-a0a6-bb1344b88a74

ext/js/dictionary/dictionary-database.js Outdated Show resolved Hide resolved
// performance.mark('drawMedia:findMultiBulk:end');
// performance.measure('drawMedia:findMultiBulk', 'drawMedia:findMultiBulk:start', 'drawMedia:findMultiBulk:end');

// move all svgs to front to have a hotter loop
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "hotter loop" mean?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CPUs do stuff like branch prediction and caching, so if you can do the same sort of processing all at the same time instead of interleaving different types of processing, you can get some performance benefits since the same branch gets executed over and over, and all necessary information is in your CPU caches. It's basically getting the benefits of batching at the micro scale.

ext/js/dictionary/dictionary-database.js Outdated Show resolved Hide resolved
*
* # On application startup
* application: create a new MessageChannel, bind event listeners to one of the ports, and send the other port to the bridge
* ↓↓<"connectToBackend1" via SharedWorker.port.postMessage>↓↓
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connectToBackend1 and connectToBackend2 seem like two different logic

connectToBackend1 is an API call that responds to a message from the application
connectToBackend2 is logic on the backend to store the port as state

Am I understanding this correctly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, that's not how I would describe it. Both of the messages serve just to pass a MessagePort from the application to the backend. The shared worker in the middle does nothing except just pass along the port. The backend stores the port.

/** @type {import('shared-worker').ApiHandler<'connectToBackend1'>} */
_onConnectToBackend1(_params, _interlocutorPort, ports) {
if (this._backendPort !== null) {
this._backendPort.postMessage(void 0, [ports[0]]); // connectToBackend2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of void 0, could we use an event type or some other data to indicate it's a connectToBackend2 message? Right now connectToBackend2's existence is only implied via comments

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally was doing that, but basically either you choose the fully typesafe API style of doing things like I did everywhere else, which requires a ton of extra code (like maybe 50~100 lines), but it seemed like overkill when there is only a single type of message which gets sent over this channel which has no (normal) parameters, so I decided to simplify it, since I can't imagine this ever getting more message types in the future (since the whole purpose of it is to establish a channel that you can use to send properly typed messages of any kind that you want). If you think it's too confusing I guess we can introduce the full API boilerplate, but we have a lot of different APIs due to this PR and I think introducing yet another one might be bad unless we figure out some better way to name them to make it clearer what's going on.

@djahandarie
Copy link
Collaborator Author

djahandarie commented Dec 17, 2024

@jamesmaa

  • Does "media loading" refer to when dictionaries have images? It seems like the performance enhancements improve general performance, or is that an edge case?

Yes, that's what it refers to. The performance enhancements improve general performance because almost all entries from recently created Yomitan dictionaries involve a high number of images, due to many dictionaries using tons of tiny SVGs for all sorts of different things (pitch accent markers, numbers, symbols, section tags, etc).

  • Can you explain at a high level what the CSS changes are and how they're related to the performance work? I'm not understanding the connection here

One CSS change (content-visibility: auto) directly improves the performance as I mentioned in the PR body, but the rest are just simplifications to match the new DOM which simply has a single canvas element now, instead of multiple layers deep of various DOM elements that we had previously.

  • Do you have a recommended order of files to read in order to better understand this PR?

Mmm, certainly ext/js/* changes are way more important than anything else. Within that folder, I guess something like this:

  • To understand the communication changes, application.js, comm/shared-worker-bridge.js, background/backend.js, background/offscreen.js
  • To understand the actual changes in media stuff, display/structured-content-generator.js, display/media-drawing-worker.js, dictionary/dictionary-database.js

and then the rest as-needed in whatever order you want.

@djahandarie djahandarie requested a review from jamesmaa December 17, 2024 12:15
@Kuuuube
Copy link
Member

Kuuuube commented Dec 17, 2024

Added some conflicts to this PR in #1677, you can merge this into the PR: #1678.

# This targets the branch of #1487

Fixes merge conflicts from #1677

History here looks a little funky but when it's actually merged into the
PR only 65ec09d and
e80b3c3 should show since master
already has the others.
Copy link
Collaborator

@jamesmaa jamesmaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cinderella moment ending my review at my local midnight. Mostly did a high level pass through.

ext/js/application.js Outdated Show resolved Hide resolved
@@ -190,10 +190,35 @@ export class Application extends EventDispatcher {
* @param {(application: Application) => (Promise<void>)} mainFunction
*/
static async main(waitForDom, mainFunction) {
/** @type {MessagePort | null} */
// If this is Firefox, we don't have a service worker and can't postMessage,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"we" here refers to the application a.k.a the iframe within the popup?

can't postMessage

How is it that we postMessage to the sharedWorkerBridge but not directly to the backend?

Copy link
Collaborator Author

@djahandarie djahandarie Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"we" here refers to the application a.k.a the iframe within the popup?

Yes.

How is it that we postMessage to the sharedWorkerBridge but not directly to the backend?

There is simply no way to get access to the backend Window object in order to postMessage it, which makes sense because direct access to Window would allow for cross-process DOM manipulation which certainly would not work. I looked through all available APIs but could not find any way to do it in a more straight-forward way so I had to use this hack (which I came up with myself btw lol).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terminology, is my understanding correct?

"application context" refers to the iframe that popups during scan and this file contains the relevant logic
"content script context" is the code that runs on each web page irrespective of whether a scan occurs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite -- Application is used both in the content script and the iframe. That's why we need the window.location.protocol === new URL(import.meta.url).protocol check to differentiate between them. It's possible that this should be done with some new parameter to Application and sending in different values from the callsite in order to change its behavior.

const sharedWorkerBridge = new SharedWorker(new URL('comm/shared-worker-bridge.js', import.meta.url), {type: 'module'});
const backendChannel = new MessageChannel();
sharedWorkerBridge.port.postMessage({action: 'connectToBackend1'}, [backendChannel.port1]);
sharedWorkerBridge.port.close();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the understanding here that the application and backend have now established a connection and this sharedWorkerBridge is no longer needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, yes. (Technically, the other side of the MessageChannel may not have a listener on it yet at this point, so it's hard to say that the connection has been fully established, but that's fine as any messages sent on it will just get queued until the listener is added and the channel is started. Either way, the work that is required to be done with application.js to establish the connection is completed.)

ext/js/application.js Outdated Show resolved Hide resolved
ext/js/dictionary/dictionary-database-worker-main.js Outdated Show resolved Hide resolved
);

// when we are not a worker ourselves, create a worker which is basically just a wrapper around this class, which we can use to offload some functions to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to educate myself via ChatGPT. lmk if this explanation of the difference between offscreen, background, and sw context is wrong

https://chatgpt.com/share/67627dc8-294c-8002-a0a6-bb1344b88a74

export class MediaDrawingWorker {
constructor() {
/** @type {number} */
this._generation = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "generation" mean in the context of Yomitan?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generation is a fairly common term used when you have some sort of garbage collection going on -- and we do here, as we clear all the old canvases from memory when there is a new lookup occurring. This is required because the backend may still be processing old requests and sending some instructions to draw things on indexes for certain canvases from the previous lookup which could cause bugs if we didn't have this tracking of generations.

ext/js/display/structured-content-generator.js Outdated Show resolved Hide resolved
@djahandarie djahandarie requested a review from jamesmaa December 22, 2024 03:16
Copy link
Member

@Kuuuube Kuuuube left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this mostly looks fine and it works great in my testing. But it looks like there's a memory leak in the entry handling somewhere causing detached elements to be retained.

for (let i = 0, ii = dictionaryEntries.length; i < ii; ++i) {
let i = 0;
for (const dictionaryEntry of dictionaryEntries) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a for of loop while still tracking i feels like a bad idea. I don't see a reason to not keep the loop style the same. Could use this to fix up the funky looking definition with this if you dont like let i = 0, ii = dictionaryEntries.length:

let dictionaryEntriesLength = dictionaryEntries.length;
for (let i = 0; i < dictionaryEntriesLength; ++i) {

@Kuuuube
Copy link
Member

Kuuuube commented Dec 22, 2024

More info on the aforementioned memory leak:

I'm getting this due to the external link icon svg. It is added to structured content here https://github.com/yomidevs/yomitan/blob/performance-canvas/ext/js/display/structured-content-generator.js#L446-L448. Unsure if this occurs on when rendering all svgs or just this one. I don't have any other dicts with svgs to test.

If you need a test dict for this, Jitendex contains external links in (afaik) every entry so it will always render the external link svg. Testing this on the search page with てすと (it will leak if you press search multiple times) if you need an exact case to replicate but this should happen on any searches with Jitendex present.

Checking this the same way as I mentioned in my last pr about memory leaks:

To see the detached elements: pull up the search page > F12 > Memory > Profiles > Detached elements > Start

@djahandarie
Copy link
Collaborator Author

Interesting... I don't think this PR should have affected anything related to that but maybe there is some weird indirect relation to it. Will take a look.

@djahandarie
Copy link
Collaborator Author

@Kuuuube I can't seem to get the same result from the detached elements tool. It shows a bunch of .entry elements that are detached but nothing so specific as the external link icon SVG. (I do have jitendex loaded and it is rendering the external link icons.)

@Kuuuube
Copy link
Member

Kuuuube commented Dec 24, 2024

Yes, it does cause the entire entry to be retained. I just gave it a quick check through the code manually and removing the external link svg caused the entries to stop being retained. I'm not sure if it's specific to this svg or if all svgs will do this.

ext/js/display/display.js Outdated Show resolved Hide resolved
ext/js/display/display.js Outdated Show resolved Hide resolved
Copy link
Member

@Kuuuube Kuuuube left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Tested on Chromium, Firefox, Firefox Android, and Kiwi. Works perfectly.

I think this is good to merge in soon. Can leave it for a bit incase anyone else has comments but jmaa has done a great job looking it over and I don't see anything alarming.

Copy link
Collaborator

@jamesmaa jamesmaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel like this needs me to be in the blocking path anymore. I understand what's going on at a high level. I didn't really review the nitty-gritty of the media-drawing-worker logic. Kuuube's thorough testing is clutch here to unblocking merging this PR.

One thing of note: it does seem like this PR does cause a performance regression in the Playwright tests. I don't think this should be blocking per-se, but it's worth investigating since it's going from 2m -> 7m

https://github.com/yomidevs/yomitan/actions/runs/12476495471/job/34821384451


// ImageDecoder is slightly faster than Blob/createImageBitmap, but
// 1) it is not available in Firefox <133
// 2) it is available in Firefox >=133, but it's not possible to transfer VideoFrames cross-process
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we ever need to transfer VideoFrame?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to use ImageDecoder, yes, as it returns a VideoFrame (the reason its VideoFrame is because of GIF support). And ImageDecoder would be ideal as it can be a simple unified and performant interface for doing this.

But we'll need to wait for Firefox to support cross-process transfer of more things with postMessage, which I don't expect to come any time soon 😿

@djahandarie djahandarie added this pull request to the merge queue Dec 24, 2024
Merged via the queue into master with commit 3142fa9 Dec 24, 2024
11 checks passed
@djahandarie djahandarie deleted the performance-canvas branch December 24, 2024 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance The issue or PR is related to performance kind/enhancement The issue or PR is a new feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants